Strengths and limitations of the minimum evolution principle.

Authors

  • O Gascuel
  • D Bryant
  • F Denis
Abstract

The idea of inferring phylogenies by selecting trees that minimize the total tree length can be traced back to the 19th century and the mathematician Jakob Steiner. It complies with Occam's principle of scientific inference, which essentially maintains that simpler explanations are preferable to more complicated ones and that ad hoc explanations should be avoided. Parsimony methods, which infer phylogenies directly from character data, are a well-known example of this approach. They search for the tree that requires the minimum number of mutational changes to explain the evolutionary change in the sequences studied.

With evolutionary distance data, the definition of simplicity is less obvious. We must first decide how the branch lengths are to be estimated and then how the tree length is to be calculated from these branch lengths. In practice, branch lengths are usually estimated within the least-squares framework. Several different least-squares methods are available to choose from, each using a different model for the variances and covariances of the observed distances. Several definitions of tree length have also been proposed, differing from one another in their treatment of negative branch lengths. We shall discuss branch length estimation first (see also Searl, 1971; Bulmer, 1991; Swofford et al., 1996) and then the various definitions of tree length.

Let $\delta_{ij}$ be the estimate of the evolutionary distance between taxa $i$ and $j$, obtained from sequences or any other data, and let $\Delta = (\delta_{(ij)})$ be a column vector containing all the $\delta_{ij}$ estimates, with $(ij)$ denoting the index of the pair $i, j$.
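To make the pair-indexing convention concrete, here is a minimal sketch in Python with NumPy. The language, the helper name `delta_vector`, and the taxon labels and distances are illustrative assumptions, not taken from the paper. It flattens a symmetric matrix of distance estimates into $\Delta$, ordering the pairs $i < j$ lexicographically, which is just one possible choice for the index $(ij)$.

```python
from itertools import combinations

import numpy as np


def delta_vector(dist, taxa):
    """Flatten a symmetric matrix of distance estimates into the column vector
    Delta = (delta_(ij)), one entry per unordered pair i < j.

    Hypothetical helper: returns the vector and the list of taxon pairs, so
    that pairs[r] names the pair whose index (ij) is r."""
    dist = np.asarray(dist, dtype=float)
    n = len(taxa)
    if dist.shape != (n, n) or not np.allclose(dist, dist.T):
        raise ValueError("expected a symmetric n x n distance matrix")
    # Lexicographic pair order: (0,1), (0,2), ..., (n-2, n-1).
    index_pairs = list(combinations(range(n), 2))
    delta = np.array([dist[i, j] for i, j in index_pairs])
    pairs = [(taxa[i], taxa[j]) for i, j in index_pairs]
    return delta, pairs


# Hypothetical distance estimates for four taxa.
dist = [
    [0.0, 0.3, 0.5, 0.6],
    [0.3, 0.0, 0.6, 0.7],
    [0.5, 0.6, 0.0, 0.7],
    [0.6, 0.7, 0.7, 0.0],
]
delta, pairs = delta_vector(dist, ["A", "B", "C", "D"])
for pair, value in zip(pairs, delta):
    print(pair, value)
```

Any fixed pair order works, as long as the vector of tree-induced distances uses the same one so that the two vectors can be compared entry by entry.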
Let $T$ be the tree being studied, $d_{ij}$ the distance induced by $T$ between taxa $i$ and $j$ (i.e., $d_{ij}$ is equal to the length of the path connecting $i$ to $j$ in $T$), and $D = (d_{(ij)})$ a column vector containing all the $d_{ij}$ distances, ranked in the same order as in $\Delta$. Using matrix notation, the branch lengths of $T$ can be represented by a column vector $B = (b_k)$, with $b_k$ denoting the length of branch $k$, whereas the topology of $T$ can be represented by a 0-1 matrix $A = (a_{(ij)k})$ such that $a_{(ij)k}$ is equal to 1 if branch $k$ lies on the path connecting $i$ and $j$, and 0 otherwise. With this notation we have $D = AB$, and the branch lengths are estimated by minimizing the difference between the observation $\Delta$ and $D$.

The ordinary least-squares (ordinary-LS) approach involves minimizing the squared Euclidean fit between $\Delta$ and $D$, that is, $(D - \Delta)^T (D - \Delta)$, which yields $B = (A^T A)^{-1} A^T \Delta$. However, this approach implicitly assumes that each $\delta_{ij}$ estimate is independent and has the same variance, which is not generally true because of the common evolutionary history of the sequences (or molecules) in question, and because large distances are much more variable than short distances. So, we often use weighted least-squares (weighted-LS), that is, $(D - \Delta)^T V^{-1} (D - \Delta)$, where $V$ is the diagonal matrix containing the variances of the $\delta_{ij}$ estimates. This yields $B = (A^T V^{-1} A)^{-1} A^T V^{-1} \Delta$. Weighted-LS accounts for the unequal variances of the estimates but not for their dependencies. The minimum-variance, and hence most reliable, branch length estimates are obtained by generalized least-squares (generalized-LS), whose formula is identical to that of weighted-LS except that $V$ now equals the full variance-covariance matrix of the $\delta_{ij}$ estimates. However, generalized-LS is rarely used because the full $V$ matrix is usually poorly known, and because inverting $V$ requires a lot of computing time.
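The estimation step can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `path_incidence_matrix` and `ls_branch_lengths`, the quartet topology, the branch lengths, and the variance model used for weighted-LS are all hypothetical. The sketch builds the 0-1 matrix $A$ from an edge list, forms noise-free distances $D = AB$, and recovers $B$ from the closed form $B = (A^T V^{-1} A)^{-1} A^T V^{-1} \Delta$; passing a full variance-covariance matrix as `V` gives generalized-LS.

```python
from itertools import combinations

import numpy as np


def path_incidence_matrix(edges, leaves):
    """Hypothetical helper: A[(ij), k] = 1 iff branch k lies on the path
    between leaves i and j. Branch k is edges[k]; rows follow
    itertools.combinations(leaves, 2), returned as `pairs`."""
    adjacency = {}
    for k, (u, v) in enumerate(edges):
        adjacency.setdefault(u, []).append((v, k))
        adjacency.setdefault(v, []).append((u, k))

    def branches_between(src, dst):
        # Depth-first search; a tree has exactly one path between two nodes.
        stack = [(src, None, [])]
        while stack:
            node, parent, path = stack.pop()
            if node == dst:
                return path
            for neighbour, k in adjacency[node]:
                if neighbour != parent:
                    stack.append((neighbour, node, path + [k]))
        raise ValueError(f"no path between {src!r} and {dst!r}")

    pairs = list(combinations(leaves, 2))
    A = np.zeros((len(pairs), len(edges)))
    for row, (i, j) in enumerate(pairs):
        A[row, branches_between(i, j)] = 1.0
    return A, pairs


def ls_branch_lengths(A, delta, V=None):
    """Minimize (AB - delta)^T V^-1 (AB - delta) over B.

    V=None: ordinary-LS (V = identity); diagonal V: weighted-LS;
    full variance-covariance V: generalized-LS. Linear solves replace the
    explicit inverses but give the same closed-form solution."""
    if V is None:
        V = np.eye(len(delta))
    vinv_a = np.linalg.solve(V, A)          # V^-1 A
    vinv_delta = np.linalg.solve(V, delta)  # V^-1 delta
    return np.linalg.solve(A.T @ vinv_a, A.T @ vinv_delta)


# Hypothetical quartet ((1,2),(3,4)) with internal nodes u and v.
edges = [("1", "u"), ("2", "u"), ("3", "v"), ("4", "v"), ("u", "v")]
A, pairs = path_incidence_matrix(edges, ["1", "2", "3", "4"])
B_true = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
delta = A @ B_true  # noise-free, tree-additive distances: D = AB

B_ols = ls_branch_lengths(A, delta)
# Assumed variance model: variance grows with the square of the distance.
V_diag = np.diag(delta ** 2)
B_wls = ls_branch_lengths(A, delta, V_diag)
print(np.round(B_ols, 6), np.round(B_wls, 6))
```

Because the distances here are generated without noise, every least-squares variant returns the same branch lengths; the variants only differ once $\Delta$ departs from tree additivity. Solving linear systems instead of forming $(A^T V^{-1} A)^{-1}$ explicitly is the usual numerically safer choice.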
Ordinary-LS is a special case of weighted-LS, obtained when all variances are equal, whereas weighted-LS is a special case of generalized-LS, corresponding to the case in which all covariances are null. Minimization of these criteria sometimes gives negative branch lengths, which do not correspond to any biological process. The general approach for dealing with this problem is nonnegative least-squares regression (Lawson and Hanson, 1974), which applies to generalized-LS and thus to weighted-LS and ordinary-LS. Several algorithms (e.g.,
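Nonnegative least-squares can be illustrated with a small sketch. It assumes Python with NumPy, and the function name `nonneg_ls_branch_lengths`, the quartet, and the distances are hypothetical. It does not implement the Lawson and Hanson (1974) active-set algorithm cited in the paragraph; it swaps in plain cyclic coordinate descent on the equivalent quadratic $B^T Q B - 2 c^T B$ with $Q = A^T V^{-1} A$ and $c = A^T V^{-1} \Delta$, which converges to the constrained minimum when $Q$ is positive definite.

```python
import numpy as np


def nonneg_ls_branch_lengths(A, delta, V=None, tol=1e-12, max_sweeps=10_000):
    """Minimize (AB - delta)^T V^-1 (AB - delta) subject to B >= 0.

    Hypothetical helper using cyclic coordinate descent (not Lawson-Hanson).
    V=None gives ordinary-LS weighting; a diagonal or full V gives the
    weighted-LS or generalized-LS criterion."""
    if V is None:
        V = np.eye(len(delta))
    Q = A.T @ np.linalg.solve(V, A)      # A^T V^-1 A
    c = A.T @ np.linalg.solve(V, delta)  # A^T V^-1 delta
    B = np.zeros(A.shape[1])
    for _ in range(max_sweeps):
        largest_step = 0.0
        for k in range(len(B)):
            # Exact minimizer along coordinate k, clipped at zero.
            grad_k = Q[k] @ B - c[k]
            new_bk = max(0.0, B[k] - grad_k / Q[k, k])
            largest_step = max(largest_step, abs(new_bk - B[k]))
            B[k] = new_bk
        if largest_step < tol:
            break
    return B


# Quartet ((1,2),(3,4)); columns = branches 1-u, 2-u, 3-v, 4-v, u-v;
# rows = pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
A = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
], dtype=float)
# Hypothetical distances: pairs (1,2) and (3,4) are farther apart than the
# cross pairs, which pushes the internal branch u-v below zero.
delta = np.array([1.0, 0.8, 0.8, 0.8, 0.8, 1.0])

B_unconstrained = np.linalg.solve(A.T @ A, A.T @ delta)  # ordinary-LS
B_nonneg = nonneg_ls_branch_lengths(A, delta)
print(np.round(B_unconstrained, 6))
print(np.round(B_nonneg, 6))
```

With these made-up distances, ordinary-LS gives the internal branch a length of about -0.2; the constrained fit sets it to zero and readjusts the pendant branches (from 0.5 to about 0.433). For ordinary-LS weighting, SciPy's `scipy.optimize.nnls(A, delta)` solves the same constrained problem with an active-set method.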

Journal:
  • Systematic biology

Volume 50, Issue 5

Publication date: 2001